(1) Zoom in on a bounding box [1] [2]
(2) Zoom in on a salient region [3] [4]
- relation to (1): if the salient region is a rectangle and the saliency value inside it tends to infinity (relative to the rest of the image), this becomes equivalent to zooming in on a bounding box.
- relation to pooling: weighted pooling with the saliency map as the weight map
- relation to deformable CNNs: the saliency map is used to compute a sampling offset for each position
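
The three relations above can be checked numerically with a small NumPy sketch. It is an illustration under stated assumptions, not the formulation used in [3] or [4]: it assumes a separable sampler that pushes evenly spaced quantiles through the inverse CDF of the row and column saliency marginals, and it uses nearest-neighbour lookup instead of bilinear interpolation. The function names (`sampling_coords`, `zoom`, `weighted_pool`, `offsets`) are hypothetical.

```python
import numpy as np

def sampling_coords(saliency_1d, n_out, eps=1e-12):
    """Inverse-CDF sampling positions for a 1-D saliency profile.

    Pixel i is assumed to cover the interval [i, i + 1). Higher saliency
    attracts more of the n_out samples. This is an illustrative assumption,
    not the exact sampler of any cited paper.
    """
    s = np.asarray(saliency_1d, dtype=float) + eps  # eps keeps the CDF strictly increasing
    cdf = np.concatenate([[0.0], np.cumsum(s) / s.sum()])
    edges = np.arange(len(s) + 1)                   # pixel edge positions 0..N
    q = (np.arange(n_out) + 0.5) / n_out            # evenly spaced quantiles
    return np.interp(q, cdf, edges)

def zoom(image, saliency, out_h, out_w):
    """Separable saliency-based zoom with nearest-neighbour lookup."""
    ys = sampling_coords(saliency.sum(axis=1), out_h)
    xs = sampling_coords(saliency.sum(axis=0), out_w)
    yi = np.clip(np.floor(ys).astype(int), 0, image.shape[0] - 1)
    xi = np.clip(np.floor(xs).astype(int), 0, image.shape[1] - 1)
    return image[np.ix_(yi, xi)]

def weighted_pool(features, saliency):
    """Global pooling with the saliency map as the weight map."""
    return (features * saliency).sum() / saliency.sum()

def offsets(saliency_1d, n_out):
    """Per-position offset of the saliency sampler relative to a uniform grid."""
    uniform = sampling_coords(np.ones(len(saliency_1d)), n_out)
    return sampling_coords(saliency_1d, n_out) - uniform

image = np.arange(100, dtype=float).reshape(10, 10)

# Rectangular saliency with a very large value inside the box rows 3..6, cols 2..5.
saliency = np.ones((10, 10))
saliency[3:7, 2:6] = 1e6

zoomed = zoom(image, saliency, 4, 4)           # close to cropping the box
pooled = weighted_pool(image, saliency)        # close to the mean inside the box
shift = offsets(saliency.sum(axis=1), 4)       # non-zero: samples move toward the box
flat_shift = offsets(np.ones(10), 4)           # uniform saliency: no offset
```

With the rectangular saliency map, `zoomed` matches `image[3:7, 2:6]`, which is the bounding-box crop of (1); `pooled` approaches the plain average over the same box; and `shift` plays the role of the per-position offsets in a deformable layer, vanishing when the saliency is uniform.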
References
[1] Fu, Jianlong, Heliang Zheng, and Tao Mei. “Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition.” CVPR, 2017.
[2] Zheng, Heliang, et al. “Learning multi-attention convolutional neural network for fine-grained image recognition.” ICCV, 2017.
[3] Recasens, Adria, et al. “Learning to zoom: a saliency-based sampling layer for neural networks.” ECCV, 2018.
[4] Zheng, Heliang, et al. “Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition.” arXiv preprint arXiv:1903.06150 (2019).